Analyzing High-Dimensional Data by Subspace Validity

Authors

  • Amihood Amir
  • Reuven Kashi
  • Nathan S. Netanyahu
  • Daniel A. Keim
  • Markus Wawryniuk
Abstract

We propose a novel method for analyzing high-dimensional data that contains arbitrarily shaped projected clusters and high noise levels. At the core of the method lies the idea of subspace validity: we map the data in a way that allows the quality of a subspace to be assessed with statistical tests. Experimental results on both synthetic and real data sets demonstrate the method's potential.

Similar resources

Subspace Clustering for Uncertain Data

Analyzing uncertain databases is a challenge in data mining research. Usually, data mining methods rely on precise values. In scenarios where uncertain values occur, e.g. due to noisy sensor readings, these algorithms cannot deliver high-quality patterns. Besides uncertainty, data mining methods face another problem: high dimensional data. For finding object groupings with locally relevant dimens...

A Preview on Subspace Clustering of High Dimensional Data

When clustering high dimensional data, traditional clustering methods are found to be lacking since they consider all of the dimensions of the dataset in discovering clusters whereas only some of the dimensions are relevant. This may give rise to subspaces within the dataset where clusters may be found. Using feature selection, we can remove irrelevant and redundant dimensions by analyzing the ...

Robust Subspace Approaches for Analyzing Incomplete Synchrophasor Measurements

Synchrophasor measurements can significantly enhance the monitorability of the power grid by revealing the dynamics of grid operation. However, due to high-rate samples collected in large volume, big data challenges emerge to efficiently process the data. The present work advocates robust subspace approaches including robust principal component analysis and subspace clustering, to identify low-...

Locally Adaptive Subspace Regression

Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In ...

DBSC: A Dependency-Based Subspace Clustering Algorithm for High Dimensional Numerical Datasets

We present a novel algorithm called DBSC, which finds subspace clusters in numerical datasets based on the concept of “dependency”. This algorithm uses a depth-first search strategy to find the maximal subspaces: a new dimension is added to the current k-subspace and its validity as a (k+1)-subspace is evaluated. The clusters within those maximal subspaces are mined in a similar fashion as maxi...

Publication date: 2003